The data for this analysis was collected through the United Nations World Food Programme's (WFP) vulnerability analysis and is updated monthly. The dataset can be retrieved from here ——
After loading the dataset into R, I immediately realized how large it was: 687,253 observations of 18 variables, covering 74 countries and 304 different types of food, each priced in one of 60 currencies. With so many observations, I also knew there would be a lot of cleaning to do before I could even start my analysis. Since I did not know what I was going to find, I kept the cleaning phase as broad as possible by retaining as many observations as possible.
One thing to note about this dataset is that most of the data comes from Africa.
I began by renaming columns to make them easier to read and removing the ones that would not be useful. The original data also had separate year and month columns. I wrote a function to merge them into a single date column, but I retained both the year and month columns.
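A minimal sketch of what such a date-merging function could look like in base R, assuming the columns are called `year` and `month` (the real column names may differ). Because prices are monthly, every date is pinned to the first day of its month.

```r
# Hypothetical helper: combine year and month columns into one Date column,
# keeping the original year and month columns in place.
make_date_column <- function(df, year_col = "year", month_col = "month") {
  # Monthly data, so each observation is dated to the 1st of its month
  df$date <- as.Date(sprintf("%04d-%02d-01",
                             as.integer(df[[year_col]]),
                             as.integer(df[[month_col]])))
  df
}

prices <- data.frame(year = c(2006, 2007), month = c(11, 2), price = c(1.2, 1.5))
prices <- make_date_column(prices)
prices$date
# [1] "2006-11-01" "2007-02-01"
```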
To add more information, I added a country code column by downloading an existing data frame of countries and performing a left join on the two, so that every country had its appropriate code. This also added region and income group columns. I performed the same kind of join to categorize foods into groups, for example fruits, meat, and bread. This was when I found that some entries were not food, such as fuel and manual labor. I left them in but grouped them as “Not Food.”
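Both joins can be sketched with base R's `merge()`, where `all.x = TRUE` turns it into a left join. The lookup tables below (and their columns `iso3`, `region`, `income_group`, `food_group`) are hypothetical stand-ins for the downloaded data frames, and the values are illustrative only.

```r
# Hypothetical price rows, including one non-food entry
prices <- data.frame(
  country = c("Mali", "Chad", "Mali"),
  food    = c("Rice", "Millet", "Fuel (diesel)"),
  price   = c(350, 280, 700),
  stringsAsFactors = FALSE
)
# Hypothetical country lookup table
countries <- data.frame(
  country      = c("Mali", "Chad"),
  iso3         = c("MLI", "TCD"),
  region       = c("Sub-Saharan Africa", "Sub-Saharan Africa"),
  income_group = c("Low income", "Low income"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every price row; unmatched rows get NA instead of being dropped
prices <- merge(prices, countries, by = "country", all.x = TRUE)

# Same pattern for food groups; anything with no group is labelled "Not Food"
food_groups <- data.frame(food = c("Rice", "Millet"),
                          food_group = c("Cereals", "Cereals"),
                          stringsAsFactors = FALSE)
prices <- merge(prices, food_groups, by = "food", all.x = TRUE)
prices$food_group[is.na(prices$food_group)] <- "Not Food"
```

Note that `merge()` sorts the result by the join key, so row order is not preserved.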
The hardest part of the cleaning was unifying all the prices, because I had 60 unique currencies spanning the last two decades. Not only was the exchange rate a factor, but so were inflation and deflation. To overcome this, I downloaded a dataset of PPP (purchasing power parity) conversion factors from the World Bank, which would largely account for both exchange rates and inflation. Once I had the PPP dataset, I wrote a function to match every country and year with its appropriate PPP factor. With a PPP factor column in place, I could calculate a unified price column by dividing the original price by the PPP factor. Unfortunately, the PPP dataset did not have a factor for every country and every year, so those entries were deleted.
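A minimal sketch of the country-and-year matching, assuming a PPP table with columns `country`, `year` and `ppp_factor` (local currency units per international dollar). The factor values here are made up. A plain `merge()` on both keys is an inner join, so country-years without a PPP factor drop out, which mirrors the deletion described above.

```r
prices <- data.frame(
  country = c("Mali", "Mali", "Chad"),
  year    = c(2010, 2011, 2010),
  price   = c(400, 420, 500),
  stringsAsFactors = FALSE
)
# Hypothetical PPP table; note there is no factor for Mali in 2011
ppp <- data.frame(
  country    = c("Mali", "Chad"),
  year       = c(2010, 2010),
  ppp_factor = c(250, 250),  # made-up values for illustration
  stringsAsFactors = FALSE
)

# Inner join on country AND year: unmatched country-years are removed
prices <- merge(prices, ppp, by = c("country", "year"))
prices$price_ppp <- prices$price / prices$ppp_factor
```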
After this, I thought my analysis could begin, but I quickly realized that I would have to unify the unit column as well. Some foods were measured in KG, G, Pounds, Gallons, ML, L and even Marmite and Cuartilla. So I converted everything to G, L or Units. From there, I divided the unified price by the number of units to calculate a price-per-unit column.
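One way to do this conversion is a lookup table of factors into grams, litres or single items. This sketch assumes each row carries a numeric `quantity` and a `unit` (for example a 50 KG bag); only standard units are listed, since local units such as Marmite and Cuartilla vary by country and would need separately sourced factors.

```r
# Conversion factors to a base unit (US gallon, avoirdupois pound)
to_base <- data.frame(
  unit      = c("KG", "G", "Pound", "L", "ML", "Gallon", "Unit"),
  base_unit = c("G",  "G", "G",     "L", "L",  "L",      "Unit"),
  factor    = c(1000, 1,   453.592, 1,   0.001, 3.78541, 1),
  stringsAsFactors = FALSE
)

# Hypothetical rows: quantity and unit describe the package being priced
prices <- data.frame(
  food      = c("Rice", "Oil", "Beans"),
  quantity  = c(50, 5, 1),
  unit      = c("KG", "L", "Pound"),
  price_ppp = c(40, 9, 0.9),
  stringsAsFactors = FALSE
)

prices <- merge(prices, to_base, by = "unit", all.x = TRUE)
# Price for one gram, one litre or one item
prices$price_per_unit <- prices$price_ppp / (prices$quantity * prices$factor)
```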
To finally narrow my focus, I looked at the most common foods. The top 10, in order, were Maize, Millet, Sorghum, Rice (imported), Rice, Maize (white), Rice (local), Wheat, Sugar and Wheat flour. I had a substantial number of entries, but I wanted to unify all types of the same food, so I added a broader grouping (for example, Rice, Rice (imported) and Rice (local) all count as Rice), which left me with substantially more values to analyze in my top 10. I decided to focus only on foods with over 25,000 entries, which left me with Rice, Maize, Sorghum, Beans, Millet and Oil. I did not rename the original entries because I wanted to be able to distinguish between local and imported foods.
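A minimal sketch of the regrouping, assuming varieties are marked by a parenthesised qualifier such as "(imported)". The original name is kept alongside the new `food_group` column, and groups are filtered by row count. The threshold is lowered to 1 for this toy data; the analysis above used 25,000.

```r
foods <- data.frame(
  food = c("Rice", "Rice (imported)", "Rice (local)", "Maize (white)",
           "Maize", "Sugar"),
  stringsAsFactors = FALSE
)
# Strip a trailing "(...)" qualifier to get the broader food group
foods$food_group <- sub("[[:space:]]*\\(.*\\)$", "", foods$food)

counts <- table(foods$food_group)
keep   <- names(counts)[counts > 1]  # the analysis above used > 25000
top    <- foods[foods$food_group %in% keep, ]
```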
Now it was finally time to start my analysis, or so I thought. I split the top 6 foods into 6 data frames, and the first thing I tried was plotting a line chart, where I noticed something strange. It was not a traditional line graph, so I investigated my data and realized that for some countries I only had the national average, while for others I had multiple markets. That meant multiple prices for the same date, which would not graph properly. So, back to cleaning. I wrote a function to calculate the national average for each food in the top 6 food data frames. One thing to note is that this is not a true national average, because I assumed that all markets are weighted equally, which in reality is not true. I also wrote functions to calculate import vs. local, regional and global averages.
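The national average can be sketched with base R's `aggregate()`, taking an unweighted mean over markets for each country and date (so every market counts equally, as noted above). Column names and prices are assumptions for illustration.

```r
rice <- data.frame(
  country = c("Mali", "Mali", "Mali", "Chad"),
  market  = c("Bamako", "Kayes", "Bamako", "National Average"),
  date    = as.Date(c("2010-01-01", "2010-01-01", "2010-02-01", "2010-01-01")),
  price   = c(1.0, 1.4, 1.1, 2.0)
)

# One row per country and date: the simple mean over all markets
national <- aggregate(price ~ country + date, data = rice, FUN = mean)
```

Swapping `country` for a region or origin column in the formula gives the regional and import vs. local averages in the same way.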
I started the analysis with the food that had the most entries, which was rice with 67,003 entries. For each food, I started with a worldwide view, then narrowed it down to six regions, and then narrowed it further to specific countries.
Let's take a look at how the price of rice has changed over the last ten years by taking the average price for each day and plotting a line chart.
As you can see, the price has increased overall in the last decade. Notice the spike in price from 2006 to 2007.
To look at the overall inflation of rice, I plotted monthly inflation as a histogram.
It looks like a relatively normal distribution, but there are a few outliers. The bin far out at 100 is Liberia, which has an extremely high inflation rate.
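Monthly inflation here is taken to be the month-over-month percentage change, 100 × (p_t / p_{t-1} − 1), computed within each country after sorting by date. This is a sketch assuming one national-average price per country per month; if months are missing, it measures the change between consecutive observations instead.

```r
monthly_inflation <- function(df) {
  df <- df[order(df$country, df$date), ]
  df$inflation <- ave(df$price, df$country, FUN = function(p) {
    # The first month of each country has no prior month, so it is NA
    c(NA, 100 * (p[-1] / p[-length(p)] - 1))
  })
  df
}

national <- data.frame(
  country = c("Mali", "Mali", "Mali", "Chad", "Chad"),
  date    = as.Date(c("2010-01-01", "2010-02-01", "2010-03-01",
                      "2010-01-01", "2010-02-01")),
  price   = c(100, 110, 99, 200, 200)
)
national <- monthly_inflation(national)
```

The histogram is then `hist(national$inflation)`; `hist()` drops the `NA` first months automatically.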
Let's narrow our analysis slightly by looking at rice in different regions of the world, first by plotting them all on the same line chart.
You can see that East Asia & Pacific, South Asia, Latin America & Caribbean and Middle East & North Africa all seem to follow a similar trend, while Europe & Central Asia has a slightly higher price. Most notable is Sub-Saharan Africa, which has multiple spikes in price. Unfortunately, we do not have a complete time series for all regions.
But let's take a closer look at all the regions by faceting each chart.
After faceting, you can see how sporadic Sub-Saharan Africa is. This region was probably the reason for the spikes from 2001 to 2007 when all regions were plotted together. This makes sense because Africa is prone to food price fluctuations due to shortages, poverty and local conflict. In East Asia & Pacific, the price drops for a few years but then starts to pick up again. (Find out why: was this during a good growing season?) The Middle East & North Africa looks similar, but we do not have enough data to see whether this was a trend. South Asia has the most controlled price, which makes sense because over 90 percent of the world’s rice is produced and consumed in the Asia-Pacific region, which would lead to cheaper prices for locals. (http://www.fao.org/docrep/003/x6905e/x6905e04.htm)
Price only tells half of the story. Looking at inflation for each region will show which regions are stable and which are unstable. To do this, I created a box plot.
It is no surprise that Sub-Saharan Africa looks like the most unstable region, with the most significant outliers. After investigating the data, I found that these outliers are caused by Liberia in 2006 and Rwanda in 2015. You can also see that Latin America & Caribbean has the smallest interquartile range, which means it is one of the most stable regions for rice prices. (find why)
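To trace box plot outliers back to a country and year, base R's `boxplot.stats()` returns the points a box plot would draw beyond 1.5 × IQR from the hinges. This sketch uses made-up inflation values for one region with placeholder country names; for several regions, apply it per region (for example via `split()`).

```r
infl <- data.frame(
  region    = "Sub-Saharan Africa",
  country   = c("A", "B", "C", "A", "B", "F"),
  year      = c(2010, 2010, 2011, 2011, 2012, 2006),
  inflation = c(1, 2, 3, 4, 5, 100),  # made-up values
  stringsAsFactors = FALSE
)

# Values the box plot would show as individual outlier points
out_values <- boxplot.stats(infl$inflation)$out
outliers   <- infl[infl$inflation %in% out_values, c("country", "year", "inflation")]
```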
So far we have found that from 2006 to 2007 there was a spike in rice prices worldwide. We also know that Sub-Saharan Africa has the most volatile prices, with the highest spikes. The same holds for inflation: Sub-Saharan Africa is the least stable region and Latin America & Caribbean is the most stable.
To further investigate these findings, we will analyze the price of rice in every available country. (once I know I have the correct graphs I will investigate the countries)
Next, let's take a look at how inflation varies from country to country. (Are histograms the best way to do this? As with price, once I have the correct graphs I will investigate the countries.)
To compare inflation across all countries, we can plot them on a box plot once again.
Since rice has so many values, we have enough entries classified as imported and local. This allows us to compare the price of local and imported rice to see whether there is a correlation between the two.
As you can see, in every country imported, local and not-listed rice follow the same trend, with only a slight increase or decrease in price. You can even see where they have exactly the same price, shown by the purple line.
But let's see whether inflation differs significantly between imported and local rice.
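A minimal sketch of that comparison: pair imported and local prices by country and date with `merge()`, then compute the Pearson correlation with `cor()`. The `origin` labels and the prices are assumptions for illustration.

```r
rice <- data.frame(
  country = "Mali",
  date    = rep(as.Date(c("2010-01-01", "2010-02-01", "2010-03-01")), times = 2),
  origin  = rep(c("Imported", "Local"), each = 3),
  price   = c(1.0, 1.2, 1.1, 0.8, 1.0, 0.9),
  stringsAsFactors = FALSE
)

imported <- subset(rice, origin == "Imported", c(country, date, price))
local    <- subset(rice, origin == "Local",    c(country, date, price))
# Align the two series on country and date
paired   <- merge(imported, local, by = c("country", "date"),
                  suffixes = c("_imported", "_local"))

cor(paired$price_imported, paired$price_local)
```

Two price series that both trend upward will correlate strongly even if their month-to-month movements differ, so correlating monthly inflation rather than price levels is a stricter check.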
It looks like Chad and Mali have identical distributions for both Import and Not Listed, which leads me to believe that the rice I categorized as Not Listed may be imported rice. Additionally, in both countries local rice has a much less stable price, which makes sense because of seasonal crops. Mali has three growing seasons: the main season (Oct to Dec), the off season (Dec to Jan) and deepwater rice (May to Jul) (http://ricepedia.org/mali). Chad has two seasons: the main season (Oct to Dec) and the off season (June to July) (http://www.fao.org/docrep/005/Y4347E/y4347e0f.htm). Off-season rice has to be grown in well-irrigated areas.
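Stability can also be put into numbers by comparing the interquartile range (the box height in a box plot) of monthly inflation for each country and origin, using `tapply()` with `IQR()`. The values below are made up to mimic the pattern described, with a placeholder country name.

```r
infl <- data.frame(
  country   = "A",
  origin    = rep(c("Imported", "Not Listed", "Local"), each = 4),
  inflation = c(-1, 0, 1, 2,   -1, 0, 1, 2,   -10, -2, 6, 15),  # made up
  stringsAsFactors = FALSE
)

# Matrix of IQRs: rows are countries, columns are origins
spread <- tapply(infl$inflation, list(infl$country, infl$origin), IQR)
```

Identical IQRs for Imported and Not Listed, alongside a much larger one for Local, would be the numeric counterpart of the box plot reading above.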
The last analysis I will do for rice is the price to survive: the cost of a year's worth of rice at 1,000 calories of rice a day. I calculated that 1,000 calories of rice is about 769.23 g.

```r
# Price to survive for every country
price_to_survive_plot_line(rice_price_to_survive)
price_to_survive_plot_bar(rice_price_to_survive)

# Same plots with Nigeria and Liberia taken out
price_to_survive_plot_line(rice_price_to_survive_no_lib_nig)
price_to_survive_plot_bar(rice_price_to_survive_no_lib_nig)
```
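The arithmetic behind the price to survive can be sketched as follows. The figure of 769.23 g per 1,000 calories implies about 1.3 calories per gram (1000 / 1.3 ≈ 769.23). The `price_to_survive` helper below is hypothetical and is not the author's plotting function; it simply multiplies a price per gram by a year of grams.

```r
grams_per_day  <- 1000 / 1.3           # about 769.23 g for 1,000 calories
grams_per_year <- grams_per_day * 365  # about 280,769 g, roughly 281 kg

# Hypothetical helper: yearly cost from a (PPP-adjusted) price per gram
price_to_survive <- function(price_per_gram) {
  price_per_gram * grams_per_year
}

price_to_survive(0.0008)
```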
World Price
World Inflation
Regional Price Line
Regional Price Matrix
Regional Inflation
Short Summary
Country Price Matrices
Country Inflation Matrices
Inflation Country Box
World Price
World Inflation
Regional Price Line
Regional Price Matrix
Regional Inflation
Short Summary
Country Price Matrices
Country Inflation Matrices
Inflation Country Box
World Price
World Inflation
Regional Price Line
Regional Price Matrix
Regional Inflation
Short Summary
Country Price Matrices
Country Inflation Matrices
Inflation Country Box
World Price
World Inflation
Regional Price Line
Regional Price Matrix
Regional Inflation
Short Summary
Country Price Matrices
Country Inflation Matrices
Inflation Country Box
World Price
World Inflation
Regional Price Line
Regional Price Matrix
Regional Inflation
Short Summary
Country Price Matrices
Country Inflation Matrices
Inflation Country Box